Transactions - Isolation Levels

#WeakIsolationLevels #weak #isolation

Types of transaction isolation levels:

  • read uncommitted
  • read committed
  • snapshot isolation
  • repeatable reads

#readcommited

  • read committed means there are no dirty reads or dirty writes.
  • it's the most basic level of transaction isolation

By preventing #dirtyreads and #dirtywrites, we mean never reading or overwriting data that has not yet been committed (and might still be aborted or rolled back).

The mechanism that prevents dirty writes is requiring a LOCK on the ROW: the writing transaction must hold the lock until it finishes.

The cons:

  • acquiring locks may cause real performance issues on long-running write operations, since we cannot release the lock until the write operation has finished.

To prevent #dirtyreads without read locks, most DBs store both the old committed value and the new value from the transaction holding the write lock. Other transactions read the old committed value.
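The "keep the old committed value" trick above can be sketched in a few lines. This is a toy, single-key-version model, not any real database's implementation; the class and method names are illustrative.

```python
# Toy read committed store: while a transaction holds a pending (uncommitted)
# write, readers still see the last committed value -- no dirty reads.
import threading

class RCStore:
    def __init__(self):
        self.committed = {}    # key -> last committed value
        self.uncommitted = {}  # key -> (tx_id, pending value)
        self.lock = threading.Lock()

    def read(self, key):
        # Readers never see dirty data: always the committed version.
        return self.committed.get(key)

    def write(self, tx_id, key, value):
        with self.lock:
            self.uncommitted[key] = (tx_id, value)

    def commit(self, tx_id):
        with self.lock:
            for key, (owner, value) in list(self.uncommitted.items()):
                if owner == tx_id:
                    self.committed[key] = value
                    del self.uncommitted[key]

store = RCStore()
store.write(tx_id=1, key="x", value="new")
print(store.read("x"))  # None -- the uncommitted write is invisible
store.commit(1)
print(store.read("x"))  # "new" -- visible only after commit
```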

On the #readcommited type of #isolation, there's something called #readskew, which refers to an acceptable inconsistency on the client side caused by a timing issue: reads that land in between another transaction's writes.
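A concrete timeline of read skew, using the classic two-accounts illustration (account names and balances are made up). Each individual read sees only committed data, yet the combination observes a total that never existed.

```python
# Read skew under read committed: a reader's two queries straddle
# another transaction's committed transfer.
accounts = {"checking": 50, "savings": 50}  # invariant: total == 100

seen_checking = accounts["checking"]        # reader's first query: 50

# A transfer of 10 commits between the reader's two queries:
accounts["checking"] -= 10
accounts["savings"] += 10

seen_savings = accounts["savings"]          # reader's second query: 60

print(seen_checking + seen_savings)         # 110 -- a total that never existed
```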

#snapshotisolation #mvcc #multiversionconcurrencycontrol

"readers never block writers, and writers never block readers"

snapshot isolation is used when you cannot have #readskews nor any data inconsistencies, e.g. whole-DB backups and long-running analytic queries.

  • creates a consistent snapshot of the whole database, containing only data committed before the current transaction started. Thus it doesn't require any read locks, only write locks.

This approach requires the ability to read the database as of any point in time: it actually writes new row versions into the table with created_by and deleted_by columns, which refer to the transaction id (_txId). Garbage collection is then triggered to remove versions no transaction can see anymore.
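The visibility rule implied by those created_by/deleted_by columns can be sketched as a small function. This is a simplified model (it assumes we already know which transactions had committed when the snapshot was taken); real MVCC implementations track more state.

```python
# MVCC visibility sketch: a row version is visible if it was created by a
# transaction that committed before our snapshot (or by us), and has not
# been deleted by such a transaction.
def visible(row, my_txid, committed_before_snapshot):
    created_ok = (row["created_by"] in committed_before_snapshot
                  or row["created_by"] == my_txid)
    deleted = row["deleted_by"] is not None and (
        row["deleted_by"] in committed_before_snapshot
        or row["deleted_by"] == my_txid)
    return created_ok and not deleted

rows = [
    {"value": "old", "created_by": 1, "deleted_by": 3},    # deleted by tx 3
    {"value": "new", "created_by": 3, "deleted_by": None},  # created by tx 3
]

# Snapshot taken by tx 5 when only tx 1 had committed: sees the old version.
print([r["value"] for r in rows if visible(r, 5, {1})])     # ['old']
# Snapshot taken after tx 3 committed: sees the new version.
print([r["value"] for r in rows if visible(r, 6, {1, 3})])  # ['new']
```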

#mvcc can be used for both #readcommited and #snapshotisolation; the difference is that read committed uses a separate snapshot for each query, while snapshot isolation uses the same snapshot for the entire transaction.

#indexes on #mvcc #confusing

DBs that use B-tree style indexes use the #append-only or #copy-on-write variants. Append-only B-trees create a new tree root on each write transaction, so each root is a consistent snapshot of the database at one point in time; with copy-on-write, the modified page's parents (all the way up to the root) are copied as well.

#repeatablereads

Actually it's the same as #snapshotisolation; there is only a naming confusion. Oracle calls it serializable, while PostgreSQL and MySQL call it repeatable read.

"nobody really knows what a 'Repeatable Read' means"

#preventing #lost #updates

This kind of problem arises when we read a value from the db, modify it, and write it back as separate operations (the #read-modify-write cycle).

#atomicwriteoperations

UPDATE counters SET value = value + 1 WHERE key = 'foo';

This is a concurrency-safe operation.

For #MongoDB, use the $inc operator.

We can use #cursorstability or force all atomic operations to be executed on a single thread.

To hold a lock until the update completes, look for the keyword SELECT … FROM … WHERE … FOR UPDATE.

#compare-and-set

Only allow a change to the value if the value has not changed since it was last read. If it has, retry the cycle.
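The compare-and-set retry cycle can be sketched with a plain dict standing in for the database; the lock here only simulates the atomicity the database's conditional update (e.g. `UPDATE ... WHERE value = old`) would provide.

```python
# Compare-and-set sketch: an increment succeeds only if the value is still
# what we read; otherwise the read-modify-write cycle is retried.
import threading

db = {"counter": 0}
db_lock = threading.Lock()

def compare_and_set(key, expected, new):
    # The database would execute this check-and-write atomically.
    with db_lock:
        if db[key] != expected:
            return False
        db[key] = new
        return True

def increment(key):
    while True:                       # retry the cycle on conflict
        old = db[key]                 # read
        if compare_and_set(key, old, old + 1):  # modify + conditional write
            return

threads = [threading.Thread(target=increment, args=("counter",))
           for _ in range(100)]
for t in threads: t.start()
for t in threads: t.join()
print(db["counter"])  # 100 -- no lost updates despite concurrent increments
```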

On replicas or partitions, how do we prevent lost updates, since neither locks nor compare-and-set can be used?

  • specific data structures for conflict resolution
  • several conflicting versions of the same document/value (#siblings)

#writeskews

  • Two transactions read the same objects, apply some logic, and each updates a different object. Example: the on-call doctors, where the update's precondition depends on flags the query checks.
  • To solve it, you can use an explicit lock with FOR UPDATE, or triggers and materialized views
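The on-call doctors example can be played out as a timeline. This is purely illustrative (names and the two-doctor setup are made up): both transactions pass the check on their own snapshot, each updates a different row, and no write-lock conflict ever occurs.

```python
# Write skew: two transactions check the same invariant on separate
# snapshots, then update *different* rows, so neither blocks the other --
# yet together they break the invariant.
on_call = {"alice": True, "bob": True}  # invariant: at least one doctor on call

# Both transactions take their snapshot before either one writes:
snapshot_a = sum(on_call.values())      # alice's txn sees 2 doctors on call
snapshot_b = sum(on_call.values())      # bob's txn also sees 2

if snapshot_a >= 2:                     # check passes for alice...
    on_call["alice"] = False
if snapshot_b >= 2:                     # ...and also for bob
    on_call["bob"] = False

print(on_call)  # {'alice': False, 'bob': False}: nobody is left on call
```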

Patterns for write skew:

Application logic decides based on a SELECT result; meanwhile some update or insert alters what that SELECT would return, invalidating the logic.

Phantoms are the phenomenon that occurs when we are ADDING new rows, not UPDATING existing ones, following the same logic as above.

#materializingconflicts is the approach of creating a table of lockable rows up front, so we can acquire a lock before an insert. Bookings and booking periods/availabilities are an example.

#serializable

The name serializable comes from executing transactions as if 'in series'.

Types of Serializable Isolation:

  • #serialexecution - not scalable
  • #twophaselocking #2pl - not performant
  • #serializablesnapshotisolation #ssi

the next considerations are only for single-node databases

#serialexecution consists of a single-threaded loop executing all transactions in series, one at a time. It only became feasible around 2007, because RAM got cheap enough to hold whole datasets and because #OLTP transactions are usually short and undemanding.
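The single-threaded loop can be sketched as an executor pulling whole transactions off a queue. Since only this one thread ever touches the data, no locks are needed; the function and variable names are illustrative.

```python
# Serial execution sketch: one thread runs every transaction to completion
# before starting the next, so transactions cannot interleave at all.
import queue
import threading

db = {"balance": 100}
txn_queue = queue.Queue()

def executor():
    while True:
        txn = txn_queue.get()
        if txn is None:        # sentinel: shut the loop down
            break
        txn(db)                # the entire transaction runs on this one thread

def withdraw_10(state):
    if state["balance"] >= 10:
        state["balance"] -= 10

worker = threading.Thread(target=executor)
worker.start()
for _ in range(5):
    txn_queue.put(withdraw_10)
txn_queue.put(None)
worker.join()
print(db["balance"])  # 50 -- five withdrawals applied, strictly one at a time
```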

The even better but infamous approach is to use stored procedures, which come with a lot of unwanted cons, such as lack of testability, difficulty of debugging and version control, and performance sensitivity.

Nowadays vendors actually use general-purpose programming languages for their procedures, such as Redis with Lua. Serial execution does not fit well with partitions, since a cross-partition transaction requires locks on other replicas/partitions.

Serial execution can be used for

  • datasets that fit into memory
  • small and fast transactions
  • write throughput low enough for a single CPU core to handle

#twophaselocking #2pl

For almost 30 years it was the gold standard for serializability in databases.

Several transactions are allowed to read the same object concurrently, as long as nobody is writing to it. As soon as there is any write, access becomes exclusive, blocking every other transaction's reads and writes.

Its mantra is pretty much the opposite of snapshot isolation's: "writers block readers, and readers block writers."

In 2PL the lock has two modes: SHARED MODE and EXCLUSIVE MODE.
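The two lock modes behave like a readers-writer lock, which can be sketched as below. This is a minimal model only: real 2PL also handles lock upgrades (shared to exclusive) and deadlock detection, both omitted here.

```python
# Two-mode lock sketch: many holders in shared mode at once, but exclusive
# mode requires that nobody else holds the lock in any mode.
import threading

class TwoModeLock:
    def __init__(self):
        self.cond = threading.Condition()
        self.readers = 0        # current shared-mode holders
        self.writer = False     # exclusive-mode holder present?

    def acquire_shared(self):
        with self.cond:
            while self.writer:              # readers wait only for a writer
                self.cond.wait()
            self.readers += 1

    def release_shared(self):
        with self.cond:
            self.readers -= 1
            self.cond.notify_all()

    def acquire_exclusive(self):
        with self.cond:
            while self.writer or self.readers > 0:  # writers wait for everyone
                self.cond.wait()
            self.writer = True

    def release_exclusive(self):
        with self.cond:
            self.writer = False
            self.cond.notify_all()

lock = TwoModeLock()
lock.acquire_shared()
lock.acquire_shared()       # a second reader enters immediately
lock.release_shared()
lock.release_shared()
lock.acquire_exclusive()    # would block if any reader were still active
lock.release_exclusive()
```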

Performance of 2PL is significantly worse than the weak isolation levels, caused by reduced concurrency and the overhead of acquiring and releasing locks.

2PL supports #predicatelocks, which prevent #writeskews and other kinds of #phantoms and #raceconditions.

#serializablesnapshotisolation #ssi First described in 2008, SSI is very promising and seems to deliver full serializability with only a small performance penalty.

#pessimisticlock #optimisticlock #serialexecution #2pl #mutex

#2PL and serial execution are pessimistic concurrency control mechanisms: they assume that if anything can go wrong, it's better to wait for a safe opportunity, just as mutexes are used in multi-threaded data structures.

Serializable snapshot isolation (#ssi) fits into the optimistic locking mechanism, which just hopes for the best LOL. If the outcome contains any conflict, the transaction is aborted and retried; if not, it is allowed to commit.
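The "hope for the best, validate at commit" idea can be sketched with a simple version-checking scheme. This is a generic optimistic concurrency control toy in the spirit of SSI, not the actual SSI algorithm (which detects dangerous read-write dependency structures rather than rechecking versions).

```python
# Optimistic commit validation sketch: each transaction records the versions
# it read; at commit time it aborts if any of those reads has since been
# overwritten by another committed transaction.
db = {"x": 1, "y": 1}
version = {"x": 0, "y": 0}   # bumped on every committed write

class Txn:
    def __init__(self):
        self.read_set = {}    # key -> version seen at read time
        self.write_set = {}   # key -> pending value

    def read(self, key):
        self.read_set[key] = version[key]
        return db[key]

    def write(self, key, value):
        self.write_set[key] = value

    def commit(self):
        # Validation phase: abort if anything we read changed under us.
        for key, seen in self.read_set.items():
            if version[key] != seen:
                return False                  # abort; the caller may retry
        for key, value in self.write_set.items():
            db[key] = value
            version[key] += 1
        return True

t1, t2 = Txn(), Txn()
t1.write("x", t1.read("x") + 1)
t2.write("x", t2.read("x") + 1)   # t2 read x before t1 committed
ok1 = t1.commit()
ok2 = t2.commit()
print(ok1, ok2)  # True False -- t1's commit invalidated t2's read
```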

Summary

  • 2PL is not performant
  • serial execution is not scalable
  • weak isolation levels have a good performance, but are error prone to race conditions
  • SSI stays in the middle, between full serializable isolation and the weak isolation levels.